In this study, two correlated stock pairs from the BIST30 index are analyzed. First, a basic pairs trading strategy is examined under a constant-variance assumption. Next, an advanced pairs trading strategy based on time series analysis is applied to the same stocks.
Here is the plot of GARAN and AKBNK from 2018 to 2020:
garan_akbnk_data <- wide_data %>%
  select(c(timestamp, GARAN, AKBNK)) %>%
  filter(timestamp %within% interval("2018-01-01", "2020-01-01"))
garan_akbnk_data %>%
  pivot_longer(cols = c("GARAN", "AKBNK"), names_to = "Stock", values_to = "Price") %>%
  ggplot() +
  geom_line(aes(x = timestamp, y = Price, color = Stock)) +
  labs(title = "GARAN and AKBNK Stocks from 2018 to 2020")
AKBNK and GARAN show a similar trend over time. To model their relationship, a linear regression model between GARAN and AKBNK is built.
##
## Call:
## lm(formula = GARAN ~ AKBNK, data = garan_akbnk_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.99924 -0.18489 0.00207 0.20937 0.90240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.031388 0.029689 1.057 0.29
## AKBNK 1.341477 0.004994 268.616 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3044 on 4971 degrees of freedom
## Multiple R-squared: 0.9355, Adjusted R-squared: 0.9355
## F-statistic: 7.215e+04 on 1 and 4971 DF, p-value: < 2.2e-16
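According to the summary, the linear regression of GARAN on AKBNK is statistically significant: the AKBNK coefficient (1.341) has a p-value below 2e-16 and the model explains about 93.6% of the variance in GARAN. The intercept is not significantly different from zero (p = 0.29).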
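The chunk that fitted this model is not shown in the report. A minimal sketch of what it might look like follows, using the formula from the `Call:` line of the summary. The data frame `toy` and its values are synthetic stand-ins (an assumption), not the BIST30 prices.

```r
# Synthetic stand-in for garan_akbnk_data (hypothetical values, not BIST30 data)
set.seed(1)
n <- 200
akbnk <- seq(6, 9, length.out = n)
toy <- data.frame(
  AKBNK = akbnk,
  GARAN = 1.34 * akbnk + rnorm(n, sd = 0.3)
)

# Same formula as in the Call line of the summary
model1 <- lm(GARAN ~ AKBNK, data = toy)

# Pull out the quantities discussed in the text
coefs   <- summary(model1)$coefficients
slope   <- coefs["AKBNK", "Estimate"]   # estimated slope on AKBNK
slope_p <- coefs["AKBNK", "Pr(>|t|)"]   # p-value of the slope
r2      <- summary(model1)$r.squared    # share of variance explained
```

With the real data, `summary(model1)` would print the table shown above.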
The residuals of the model are:
data.frame(index = 1:length(model1$residuals), residuals = model1$residuals) %>%
  ggplot() +
  geom_point(aes(x = index, y = residuals)) +
  labs(title = "Residuals of the linear regression model of GARAN and AKBNK")
We can plot the residuals on an X-bar control chart to spot the outliers. Here is the X-bar chart:
The standard deviation of the residuals is 0.304352. The lower and upper 2-sigma limits are -0.608704 and 0.608704, respectively. The red points indicate the residuals that are outside the limits. According to the pairs trading strategy, the stocks should be traded when the points lie beyond the limits. When a point is below the LCL, GARAN is sold and AKBNK is bought; when it is above the UCL, the opposite is done. With this strategy, the profit is:
garan_akbnk_data$SELL_GARAN_BUY_AKBNK <- model1$residuals < qcc1$limits[,"LCL"]
garan_akbnk_data$SELL_AKBNK_BUY_GARAN <- model1$residuals > qcc1$limits[,"UCL"]
income <- sum(garan_akbnk_data %>%
                filter(SELL_GARAN_BUY_AKBNK) %>%
                select(GARAN)) +
  sum(garan_akbnk_data %>%
        filter(SELL_AKBNK_BUY_GARAN) %>%
        select(AKBNK))
loss <- sum(garan_akbnk_data %>%
              filter(SELL_GARAN_BUY_AKBNK) %>%
              select(AKBNK)) +
  sum(garan_akbnk_data %>%
        filter(SELL_AKBNK_BUY_GARAN) %>%
        select(GARAN))
income - loss
## [1] 30.3665
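The chunk that created the control chart object `qcc1` is not shown. The reported limits equal plus and minus twice the standard deviation of the residuals, so the signals and the profit figure can be sketched in base R under that assumption. The data below are synthetic, and the names `toy`, `lcl`, and `ucl` are illustrative only.

```r
# Synthetic stand-in prices (hypothetical values, not BIST30 data)
set.seed(42)
n <- 500
akbnk <- seq(6, 9, length.out = n)
toy <- data.frame(AKBNK = akbnk, GARAN = 1.34 * akbnk + rnorm(n, sd = 0.3))
res <- residuals(lm(GARAN ~ AKBNK, data = toy))

# Assumed limits: centre 0 (OLS residuals with an intercept average to zero)
# plus/minus 2 standard deviations of the residuals
sigma <- sd(res)
lcl <- -2 * sigma
ucl <-  2 * sigma

# Signals, following the rule stated in the text
sell_garan_buy_akbnk <- res < lcl
sell_akbnk_buy_garan <- res > ucl

# Income: prices of the stock sold; loss: prices of the stock bought
income <- sum(toy$GARAN[sell_garan_buy_akbnk]) + sum(toy$AKBNK[sell_akbnk_buy_garan])
loss   <- sum(toy$AKBNK[sell_garan_buy_akbnk]) + sum(toy$GARAN[sell_akbnk_buy_garan])
profit <- income - loss
```

Note that this profit measure only compares the prices of the two stocks at each signal. It does not track when positions are closed.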
Now, we examine YKBNK and ISCTR. Here is the plot of the stocks from 2018 to 2020.
ykbnk_isctr_data <- wide_data %>%
  select(c(timestamp, YKBNK, ISCTR)) %>%
  filter(timestamp %within% interval("2018-01-01", "2020-01-01"))
ykbnk_isctr_data %>%
  pivot_longer(cols = c("YKBNK", "ISCTR"), names_to = "Stock", values_to = "Price") %>%
  ggplot() +
  geom_line(aes(x = timestamp, y = Price, color = Stock)) +
  labs(title = "YKBNK and ISCTR Stocks from 2018 to 2020")
Next, we build a linear regression model to predict ISCTR from YKBNK:
##
## Call:
## lm(formula = ISCTR ~ YKBNK, data = ykbnk_isctr_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32547 -0.09754 -0.00799 0.08306 0.35670
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.313005 0.009571 32.7 <2e-16 ***
## YKBNK 0.954459 0.004744 201.2 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1255 on 4971 degrees of freedom
## Multiple R-squared: 0.8906, Adjusted R-squared: 0.8906
## F-statistic: 4.048e+04 on 1 and 4971 DF, p-value: < 2.2e-16
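The linear regression model is statistically significant: both coefficients have p-values below 2e-16, and the model explains about 89.1% of the variance in ISCTR.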
Like the GARAN-AKBNK case, we can check the residuals for the pairs trading strategy. Here is the plot of the residuals:
data.frame(index = 1:length(model2$residuals), residuals = model2$residuals) %>%
  ggplot() +
  geom_point(aes(x = index, y = residuals)) +
  labs(title = "Residuals of the linear regression model of YKBNK and ISCTR")
When we plot these residuals on an X-bar chart, we have the following:
The standard deviation of the residuals is 0.1254683. The lower and upper 2-sigma limits are -0.2509366 and 0.2509366, respectively. The red points indicate the residuals that are outside the limits. According to the pairs trading strategy, we should sell ISCTR and buy YKBNK when the residuals are below the LCL, and buy ISCTR and sell YKBNK when the residuals are above the UCL.
Here is the profit associated with this strategy:
ykbnk_isctr_data$SELL_ISCTR_BUY_YKBNK <- model2$residuals < qcc2$limits[,"LCL"]
ykbnk_isctr_data$SELL_YKBNK_BUY_ISCTR <- model2$residuals > qcc2$limits[,"UCL"]
income <- sum(ykbnk_isctr_data %>%
                filter(SELL_ISCTR_BUY_YKBNK) %>%
                select(ISCTR)) +
  sum(ykbnk_isctr_data %>%
        filter(SELL_YKBNK_BUY_ISCTR) %>%
        select(YKBNK))
loss <- sum(ykbnk_isctr_data %>%
              filter(SELL_ISCTR_BUY_YKBNK) %>%
              select(YKBNK)) +
  sum(ykbnk_isctr_data %>%
        filter(SELL_YKBNK_BUY_ISCTR) %>%
        select(ISCTR))
income - loss
## [1] -80.3799
This time, the pairs trading strategy produced a loss rather than a profit. Such an outcome is possible because market dynamics cannot be modeled perfectly.
To sum up, this strategy uses linear regression to model highly correlated stock pairs and then sets control limits for trading under the assumption of constant variance. In the short term, it may be efficient and profitable: the control chart signals when to initiate trades. On the other hand, the constant-variance assumption may not hold in all conditions, which can lead to wrong or imprecise control limits. The strategy also depends on past correlations persisting, which may not happen in the future.
In this part, advanced time series analysis is conducted to model the residuals. First, we check the autocorrelation of the residuals for GARAN and AKBNK:
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 4906.4, df = 10, p-value < 2.2e-16
The residuals are highly autocorrelated, which violates the independence assumption of the linear regression model.
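The output above has the format printed by `lmtest::bgtest`, but the chunk that produced it is not shown. To illustrate what the LM statistic measures, here is a base R sketch of the Breusch-Godfrey calculation: regress the residuals on the original regressors plus their own lags, then take n times the R-squared of that auxiliary regression. The helper `bg_lm` and the simulated data are hypothetical, introduced only for this example.

```r
# Hand-rolled Breusch-Godfrey LM statistic (illustrative helper, not from the report)
bg_lm <- function(model, order = 10) {
  res <- residuals(model)
  n <- length(res)
  X <- model.matrix(model)
  # Lagged residuals; the first k values of lag k are filled with 0
  lagged <- sapply(seq_len(order), function(k) c(rep(0, k), res[seq_len(n - k)]))
  # Auxiliary regression of residuals on regressors and lagged residuals
  aux <- lm.fit(cbind(X, lagged), res)
  r2 <- 1 - sum(aux$residuals^2) / sum((res - mean(res))^2)
  stat <- n * r2
  c(LM = stat, df = order, p.value = pchisq(stat, df = order, lower.tail = FALSE))
}

# Simulated regression with strongly autocorrelated AR(1) errors
set.seed(7)
n <- 500
x <- seq(6, 9, length.out = n)
e <- as.numeric(arima.sim(list(ar = 0.9), n = n, sd = 0.1))
y <- 1.34 * x + e
out <- bg_lm(lm(y ~ x), order = 10)
```

A large LM value relative to a chi-squared distribution with `order` degrees of freedom indicates serial correlation, as in the report's output.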
We can improve the model by adding lagged values. We introduce GARAN’s lag 1 value to the model:
## # A tibble: 4,973 × 6
## timestamp GARAN AKBNK SELL_GARAN_BUY_AKBNK SELL_AKBNK_BUY_GARAN
## <dttm> <dbl> <dbl> <lgl> <lgl>
## 1 2018-01-02 06:00:00 9.20 6.95 FALSE FALSE
## 2 2018-01-02 07:00:00 9.32 7.06 FALSE FALSE
## 3 2018-01-02 08:00:00 9.34 7.10 FALSE FALSE
## 4 2018-01-02 09:00:00 9.32 7.08 FALSE FALSE
## 5 2018-01-02 10:00:00 9.33 7.10 FALSE FALSE
## 6 2018-01-02 11:00:00 9.34 7.14 FALSE FALSE
## 7 2018-01-02 12:00:00 9.32 7.12 FALSE FALSE
## 8 2018-01-02 13:00:00 9.34 7.12 FALSE FALSE
## 9 2018-01-02 14:00:00 9.33 7.12 FALSE FALSE
## 10 2018-01-02 15:00:00 9.32 7.11 FALSE FALSE
## # ℹ 4,963 more rows
## # ℹ 1 more variable: GARAN_LAG1 <dbl>
##
## Call:
## lm(formula = GARAN ~ GARAN_LAG1 + AKBNK, data = garan_akbnk_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.45535 -0.02326 0.00047 0.02436 0.50531
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.006367 0.005414 -1.176 0.24
## GARAN_LAG1 0.972249 0.002557 380.294 <2e-16 ***
## AKBNK 0.038482 0.003545 10.854 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05548 on 4969 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.9979, Adjusted R-squared: 0.9979
## F-statistic: 1.158e+06 on 2 and 4969 DF, p-value: < 2.2e-16
The model is statistically significant. We can check the residuals:
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 71.577, df = 10, p-value = 2.197e-11
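Introducing the lagged value reduced the autocorrelation of the residuals considerably: the LM statistic fell from 4906.4 to 71.577. The test still rejects the hypothesis of no serial correlation (p = 2.197e-11), but the improvement is large, so we continue with this model.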
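The chunk that built `GARAN_LAG1` and fitted the lagged model is not shown. A minimal base R sketch of how that might be done follows, using synthetic prices. The data frame `toy`, the placeholder limit `lcl`, and all values are assumptions. The sketch also shows why residual-based signal vectors need one padding value to line up with the rows of the data.

```r
# Synthetic stand-in prices (hypothetical values, not BIST30 data)
set.seed(3)
n <- 300
akbnk <- 7 + cumsum(rnorm(n, sd = 0.05))
garan <- 1.34 * akbnk + as.numeric(arima.sim(list(ar = 0.9), n = n, sd = 0.05))
toy <- data.frame(AKBNK = akbnk, GARAN = garan)

# Lag-1 column: the previous observation of GARAN, NA for the first row
# (dplyr::lag(GARAN) would produce the same column)
toy$GARAN_LAG1 <- c(NA, head(toy$GARAN, -1))

# Same formula as in the Call line of the summary
model3 <- lm(GARAN ~ GARAN_LAG1 + AKBNK, data = toy)

# lm() drops the first row because its lag is NA,
# so there is one residual fewer than there are rows
n_res <- length(residuals(model3))

# Prepending FALSE re-aligns a residual-based signal with the rows of `toy`
lcl <- -3 * sd(residuals(model3))   # placeholder lower limit for illustration
below_lcl <- c(FALSE, residuals(model3) < lcl)
```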
Next, we plot the X-bar chart of the new model. This time, we use 3 sigmas as the limit, because 2 sigma limits cause too many false alarms.
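As a rough check on the choice of limits, the expected number of out-of-limit points can be estimated under the simplifying assumption that the residuals are independent and normally distributed. That assumption is only approximate here and is used for illustration.

```r
# Number of residuals in the lagged model: 4,973 rows minus the first (NA lag)
n_obs <- 4972

# Two-sided tail probabilities of a normal distribution
p_2sigma <- 2 * pnorm(-2)   # about 4.6% of points fall outside +/- 2 sigma
p_3sigma <- 2 * pnorm(-3)   # about 0.27% of points fall outside +/- 3 sigma

# Expected signals from chance alone
expected_2sigma <- n_obs * p_2sigma   # roughly 226
expected_3sigma <- n_obs * p_3sigma   # roughly 13
```

Under these assumptions, 2-sigma limits would trigger a couple of hundred signals even when the pair relationship is unchanged, compared with about a dozen for 3-sigma limits.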
We follow the same procedure to calculate the profit associated with the pairs trade. The profit is:
garan_akbnk_data$SELL_GARAN_BUY_AKBNK <- c(FALSE, model3$residuals < qcc3$limits[,"LCL"])
garan_akbnk_data$SELL_AKBNK_BUY_GARAN <- c(FALSE, model3$residuals > qcc3$limits[,"UCL"])
income <- sum(garan_akbnk_data %>%
                filter(SELL_GARAN_BUY_AKBNK) %>%
                select(GARAN)) +
  sum(garan_akbnk_data %>%
        filter(SELL_AKBNK_BUY_GARAN) %>%
        select(AKBNK))
loss <- sum(garan_akbnk_data %>%
              filter(SELL_GARAN_BUY_AKBNK) %>%
              select(AKBNK)) +
  sum(garan_akbnk_data %>%
        filter(SELL_AKBNK_BUY_GARAN) %>%
        select(GARAN))
income - loss
## [1] 24.5917
With the pairs trade, we obtain a positive profit.
First, we check the autocorrelation of the residuals of the YKBNK and ISCTR model used in Task 1:
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 4916.7, df = 10, p-value < 2.2e-16
The residuals are highly autocorrelated.
We can improve the model by adding lag 1 of ISCTR:
## # A tibble: 4,973 × 6
## timestamp YKBNK ISCTR SELL_ISCTR_BUY_YKBNK SELL_YKBNK_BUY_ISCTR
## <dttm> <dbl> <dbl> <lgl> <lgl>
## 1 2018-01-02 06:00:00 2.45 2.63 FALSE FALSE
## 2 2018-01-02 07:00:00 2.47 2.64 FALSE FALSE
## 3 2018-01-02 08:00:00 2.48 2.64 FALSE FALSE
## 4 2018-01-02 09:00:00 2.48 2.64 FALSE FALSE
## 5 2018-01-02 10:00:00 2.48 2.64 FALSE FALSE
## 6 2018-01-02 11:00:00 2.50 2.67 FALSE FALSE
## 7 2018-01-02 12:00:00 2.49 2.66 FALSE FALSE
## 8 2018-01-02 13:00:00 2.49 2.66 FALSE FALSE
## 9 2018-01-02 14:00:00 2.49 2.66 FALSE FALSE
## 10 2018-01-02 15:00:00 2.49 2.66 FALSE FALSE
## # ℹ 4,963 more rows
## # ℹ 1 more variable: ISCTR_LAG1 <dbl>
##
## Call:
## lm(formula = ISCTR ~ YKBNK + ISCTR_LAG1, data = ykbnk_isctr_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.122347 -0.007018 -0.000023 0.006982 0.147716
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.002666 0.001343 1.986 0.0471 *
## YKBNK 0.007876 0.001825 4.316 1.62e-05 ***
## ISCTR_LAG1 0.991699 0.001804 549.588 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01597 on 4969 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.9982, Adjusted R-squared: 0.9982
## F-statistic: 1.401e+06 on 2 and 4969 DF, p-value: < 2.2e-16
The model is statistically significant and has a higher adjusted R-squared than the previous model (0.9982 versus 0.8906). We continue by checking the residuals:
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 21.057, df = 10, p-value = 0.0207
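The autocorrelation in the residuals decreased substantially relative to the first model: the LM statistic fell from 4916.7 to 21.057, although the test still rejects the hypothesis of no serial correlation at the 5% level (p = 0.0207). We can use this model to detect the pairs trading dates.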
We perform the same steps and calculate the profit:
ykbnk_isctr_data$SELL_ISCTR_BUY_YKBNK <- c(FALSE, model5$residuals < qcc4$limits[,"LCL"])
ykbnk_isctr_data$SELL_YKBNK_BUY_ISCTR <- c(FALSE, model5$residuals > qcc4$limits[,"UCL"])
income <- sum(ykbnk_isctr_data %>%
                filter(SELL_ISCTR_BUY_YKBNK) %>%
                select(ISCTR)) +
  sum(ykbnk_isctr_data %>%
        filter(SELL_YKBNK_BUY_ISCTR) %>%
        select(YKBNK))
loss <- sum(ykbnk_isctr_data %>%
              filter(SELL_ISCTR_BUY_YKBNK) %>%
              select(YKBNK)) +
  sum(ykbnk_isctr_data %>%
        filter(SELL_YKBNK_BUY_ISCTR) %>%
        select(ISCTR))
income - loss
## [1] -2.6059
This model also gave a negative profit, but the loss (-2.6059) is much smaller than the loss calculated in Task 1 (-80.3799).
The advanced pairs trading strategy using time series analysis is more dynamic: its control limits are revised using the residuals of the lagged model. It reacts to changes in the market and to evolving relationships between the stock pairs. Using time series models also yields less risky trading signals. However, overfitting may occur with this strategy if there is not much data. In our models, we used a large amount of data to avoid this.
Each strategy has its own benefits, and the proper one should be chosen depending on conditions. In the short term, the first strategy might be more profitable; in the long term, because market conditions change frequently, the second strategy would be more sensible. In conclusion, the second method, with time series analysis, offers the possibility of improved accuracy and adaptability, while the first offers a simple approach. Both, however, have disadvantages and must be weighed against a number of factors in order to be used successfully.